Abstract. We discuss the methods of Evans and Moshonov [Bayesian Analysis 1 (2006) 893-914; Bayesian Statistics and Its Applications (2007) 145-159] concerning checking for prior-data conflict and their relevance to the method proposed in this paper.
1. INTRODUCTION

This is an interesting paper dealing with an important topic. It is a logical continuation of the contributions found in Bayarri and Berger (2000). In particular, it continues the emphasis on avoiding the "double use of the data," and this is an important point with which we agree.

While it seems intuitively clear what "double use of the data" means, it would be nice to have a precise definition, as the phrase seems, at least in our view, to be used a bit too freely at times. Intuitively, in model checking, this would seem to be the situation where the fitted model depends on a particular aspect of the data and the model is then checked by comparing the same aspect of the data with the fitted model. On the other hand, we have seen assertions that a "double use of the data" is being made in situations like computing a posterior (the first use) and then (the second use) computing a characteristic of that distribution, like a mode or a highest posterior density (hpd) region. While in some technical sense this seems like using the data twice, there does not seem to be anything wrong with it, at least to us. Rather than giving a definition, this paper, like Bayarri and Berger (2000) and Robins, van der Vaart and Ventura (2000), points to a negative consequence of double use of the data, in terms of the lack of uniformity of p-values. Perhaps the factorization in Section 2 of this discussion gives a general method of ensuring that components of the total information available to a statistician for an analysis are used appropriately, and so gives a general characterization for avoidance of "double use of the information."

This paper assumes a default or "objective" prior on the last level of a hierarchically specified prior. In general this will result in an improper prior. Part of the motivation for this seems to be that "model checking with informative priors cannot separate inadequacy of the prior from inadequacy of the model," and so the methodology proposed by Box (1980), which is based on proper priors, is not used. We disagree with the quoted statement. The methods discussed in Evans and Moshonov (2006, 2007) are a modification of Box's approach and are motivated precisely by the need to separate the two kinds of inadequacy in the context of proper, informative priors which, as they should, represent subjective beliefs. We briefly outline this approach in Section 2. Evans and Moshonov (2006) also includes methodology for checking the second level of a hierarchical model based on a factorization of the full information. We discuss this in Section 3 and show that this methodology is also applicable when the first level is improper.

While we agree with the necessity of considering improper priors as part of a general theory of statistics, it is difficult for us to accept them as the basis from which statistical theory is built. It is our opinion that the core of statistics is represented by the proper prior context. As such, we feel that what is done outside this core should be highly influenced, if not directed, by the central theory with proper priors. Our discussion reflects this view and considers its implications for the situation discussed in this paper.

For us, checking the sampling model and checking the prior are important parts of a statistical analysis. A common complaint concerning the prior is that it is subjective, as it represents someone's personal beliefs about the true value of $\theta$. A common retort is that the sampling model is also subjective, as it represents someone's belief that the true distribution is in this class; that is, it was someone's subjective choice. Of course, both these statements are correct, as there is typically little "objective" about either choice. From another point of view, the fact that these choices are subjective is a good thing, because they are (hopefully) informed choices, and that should lead to better statistical analyses than if we made these choices arbitrarily or based on convention. For us, the way to reconcile the debate between objective and subjective is through checking that these ingredients make sense in light of what we know to be truly objective (at least if it is collected correctly), namely, the data. Others argue that no such checks should be made, as they lead us to be incoherent. There is a wide diversity of opinion on these matters, and we certainly acknowledge value in various points of view.

2. FACTORING THE FULL INFORMATION 

Suppose we have prescribed a sampling model $\{P_\theta : \theta \in \Theta\}$ and a proper prior $\Pi$, and have observed the data $x$. The sampling model and prior combine to give the joint model $P_\theta \times \Pi$ for $(x, \theta)$. We will suppose that this joint model and the observed data comprise the full information available to the analyst. We are not saying that further information may not be available in an analysis, but we will restrict our discussion to the situation where this is all we have. Further, denote the prior predictive measure by $M(B) = \int_\Theta P_\theta(B)\,\Pi(d\theta)$; for statistics $T$ and $U \circ T$ on the sample space, let $M_T(\cdot \mid U \circ T)$ denote the conditional prior predictive distribution of $T$ given $U \circ T$, and let $\Pi(\cdot \mid x)$ denote the posterior of $\theta$.

In Box's approach to model checking, the observed value of $x$ is compared with $M$ to see if there is model failure; that is, we check to see if $x$ is a surprising value from $M$. There would appear to be an illogicality involved in this, however, as we know, at least in the subjective Bayesian context, that $x$ was not generated from $M$. If our assertion was that $x$ was generated from $M$, perhaps as a random effects model, then it would make sense to check $x$ against $M$, as this is an assertion about the underlying data generating mechanism. It is clearly more appropriate in a Bayesian context, however, to see if $x$ is not surprising for at least one of the distributions in $\{P_\theta : \theta \in \Theta\}$, that is, to check $x$ against what we are asserting is the data generating mechanism, namely the sampling model.

As discussed in Evans and Moshonov (2006), there are two possibilities for failure in the Bayesian formulation: the sampling model may fail by $x$ being surprising for each distribution in the sampling model or, if the sampling model does not fail, the prior may conflict with the data by placing the bulk of its mass on those distributions in the sampling model for which the data is surprising. Note that it only makes sense to talk about prior-data conflict if the sampling model does not fail. Logically, checking the sampling model precedes checking for prior-data conflict.

How then should we check for prior-data conflict? Intuitively, this arises when the effective supports of the likelihood and the prior do not overlap. As discussed in Evans and Moshonov (2006), however, the clearest approach to measuring this conflict comes from asking if the observed likelihood is a surprising value from its prior predictive distribution. Given that the likelihood map is minimal sufficient, this is equivalent to asking if the observed value $T(x)$ of a minimal sufficient statistic $T$ is surprising from its marginal prior predictive $M_T$. Further consideration shows that $T(x)$ can be surprising simply because some value $U(T(x))$ is surprising, where $U \circ T$ is ancillary. When such ancillaries exist, this leads to comparing $T(x)$ to $M_T(\cdot \mid U \circ T)$, where $U \circ T$ is a maximal ancillary, as this conditioning removes the maximal amount of ancillary variation. Ancillary variation is clearly not relevant to assessing prior-data conflict, as it does not depend on the parameter. Further, there is nothing to prevent us from using some function $S(T)$ and comparing its observed value to the distribution $M_{S(T)}(\cdot \mid U \circ T)$ to check for prior-data conflict. Of course, $S$ has to be chosen sensibly if we are going to make a meaningful check.
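As a concrete illustration of this check, here is a minimal Python sketch under assumptions of our own choosing (they are not taken from the paper): $x_1, \dots, x_n$ i.i.d. $N(\theta, \sigma^2)$ with $\sigma$ known and the proper prior $\theta \sim N(\mu_0, \tau_0^2)$. Then $T = \bar{x}$ is minimal sufficient with no nontrivial ancillary, and its prior predictive $M_T$ is $N(\mu_0, \tau_0^2 + \sigma^2/n)$. Taking the discrepancy to be the prior predictive density of $T$ itself, so that $T(x)$ counts as surprising when it falls where $M_T$ has little density, the p-value reduces to a two-sided normal tail probability. The function name `prior_data_conflict_pvalue` is hypothetical.

```python
import math
from statistics import NormalDist


def prior_data_conflict_pvalue(xbar, n, sigma, mu0, tau0):
    """Check for prior-data conflict in an assumed normal location model.

    Model (assumption): x_1..x_n i.i.d. N(theta, sigma^2), sigma known;
    prior theta ~ N(mu0, tau0^2). T = xbar is minimal sufficient and
    M_T = N(mu0, tau0^2 + sigma^2 / n).
    p-value: M_T(m_T(T) <= m_T(xbar)), with m_T the prior predictive density.
    """
    sd = math.sqrt(tau0 ** 2 + sigma ** 2 / n)  # standard deviation of M_T
    z = abs(xbar - mu0) / sd
    # m_T(t) <= m_T(xbar) exactly when |t - mu0| >= |xbar - mu0|,
    # so the p-value is a two-sided standard normal tail probability.
    return 2.0 * (1.0 - NormalDist().cdf(z))
```

A small value signals that the observed mean lies in the tails of what the prior predicts, that is, the effective supports of the likelihood and the prior barely overlap.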

This approach leads to the following factorization of the joint distribution:

$$P_\theta \times \Pi = P(\cdot \mid T) \times P_{U \circ T} \times M_T(\cdot \mid U \circ T) \times \Pi(\cdot \mid x), \qquad (1)$$

where $P(\cdot \mid T)$ is the conditional distribution of the data given the minimal sufficient statistic $T$, and so does not involve $\theta$, and $P_{U \circ T}$ is the marginal distribution of $U \circ T$, which is also free of $\theta$. Each of the components in (1) plays a separate role in a statistical analysis. $P(\cdot \mid T)$ and $P_{U \circ T}$ are available for checking the sampling model, $M_T(\cdot \mid U \circ T)$ is available for checking for prior-data conflict, and $\Pi(\cdot \mid x)$ [which really only depends on the data through $T(x)$] is for inference about $\theta$. We see that $M = P(\cdot \mid T) \times P_{U \circ T} \times M_T(\cdot \mid U \circ T)$, which explains how this is a modification of Box's approach, and it shows how to check for inadequacies in the prior as well as in the sampling model.
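To see the roles of the factors in (1) in the simplest case, the following is a worked instance under assumptions of our own choosing (not an example from the paper): $x_1, \dots, x_n$ i.i.d. $N(\theta, \sigma^2)$ with $\sigma^2$ known and the proper prior $\theta \sim N(\mu_0, \tau_0^2)$. Here $T = \bar{x}$ is a complete minimal sufficient statistic, so every ancillary function of $T$ is degenerate and the factor $P_{U \circ T}$ drops out. The three remaining factors are standard normal-theory results.

```latex
\begin{aligned}
P(\cdot \mid T) &:\ (x_1-\bar{x},\dots,x_n-\bar{x})' \sim
  N_n\big(0,\ \sigma^2(I_n - n^{-1}\mathbf{1}\mathbf{1}')\big),
  \ \text{independent of } \bar{x} \text{ and free of } \theta,\\
M_T &= N\big(\mu_0,\ \tau_0^2 + \sigma^2/n\big),\\
\Pi(\cdot \mid x) &= N\!\left(
  \frac{\mu_0/\tau_0^2 + n\bar{x}/\sigma^2}{1/\tau_0^2 + n/\sigma^2},\
  \Big(\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}\Big)^{-1}\right).
\end{aligned}
```

The residuals carry the information for checking the sampling model, $M_T$ carries the information for checking the prior, and the posterior uses the data only through $\bar{x}$.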

It is our claim that (1) effectively shows us how to proceed to avoid double use of the information and, as such, double use of the data. Of course, as mentioned in the paper, it may be difficult, with complicated models, to determine $P(\cdot \mid T)$ or $P_{U \circ T}$ in meaningful ways. Accordingly, it seems reasonable in such contexts to weaken this requirement to having this hold asymptotically in some sense. For example, a chi-squared goodness-of-fit statistic is asymptotically ancillary.

In the context of an improper prior that leads to a proper posterior, (1) is still available, but now the factor $M_T(\cdot \mid U \circ T)$ is not a probability measure, and so it is not clear how we would check for prior-data conflict. As discussed in Evans and Moshonov (2006, 2007), a partial characterization of a noninformative prior is that it would never lead to evidence of a prior-data conflict, no matter what data is obtained. Thus the choice of an improper prior is an assertion that this choice avoids such a conflict. Noninformative sequences of priors are also discussed in Evans and Moshonov (2006, 2007), and these can provide a way to justify such a statement for a particular improper prior. In any case, the choice of an improper prior should not in any way change the role of the remaining factors if we follow the principle that the proper case is central. Although we do not have a formal proof, it would seem that the methods discussed in Bayarri and Berger (2000) will satisfy this asymptotically.

Further, any p-values computed according to this factorization will have the necessary uniformity properties when assessed against the appropriate measures. For example, if $p(t) = M_T(h(T) > h(t))$ is a p-value for checking for prior-data conflict with no ancillary, then $p(T)$ will be uniformly distributed, at least in the continuous case, when $T \sim M_T$.
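The uniformity claim can be checked by simulation. The Python sketch below assumes, for illustration only, that $M_T$ is a normal distribution $N(\mu_0, \mathrm{sd}^2)$ (as in a normal location model with a normal prior) and uses the hypothetical discrepancy $h(t) = |t - \mu_0|$. It estimates $p(t)$ from a reference sample drawn from $M_T$, then evaluates $p$ at fresh draws $T \sim M_T$; the resulting values should look approximately Uniform(0, 1). The names `make_pvalue` and `simulate_pvalues` are ours.

```python
import bisect
import random


def make_pvalue(reference_h):
    """Given h(T) for draws T ~ M_T, return a function estimating
    p(t) = M_T(h(T) > h(t)) from its argument h(t)."""
    s = sorted(reference_h)
    n = len(s)

    def p(h_t):
        # bisect_right counts reference values <= h_t; the rest exceed h_t
        return (n - bisect.bisect_right(s, h_t)) / n

    return p


def simulate_pvalues(mu0=0.0, sd=1.0, n_ref=5000, n_rep=2000, seed=1):
    """Draw fresh T ~ M_T (assumed normal) and return the estimated p(T)."""
    rng = random.Random(seed)

    def h(t):
        return abs(t - mu0)  # hypothetical discrepancy statistic

    p = make_pvalue([h(rng.gauss(mu0, sd)) for _ in range(n_ref)])
    return [p(h(rng.gauss(mu0, sd))) for _ in range(n_rep)]
```

The mean of the returned values should be near 0.5 and roughly 10% of them should fall below 0.1; a sorted reference sample with `bisect` keeps each evaluation logarithmic in the reference size.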

3. HIERARCHICAL MODELS 

In Evans and Moshonov (2006, 2007), methods are discussed for checking hierarchically specified priors for $\theta = (\theta_1, \theta_2) \in \Theta_1 \times \Theta_2$; that is, we specify priors $\Pi_1$ and $\Pi_2$ so that $\Pi(d(\theta_1, \theta_2)) = \Pi_2(d\theta_2 \mid \theta_1) \times \Pi_1(d\theta_1)$. In such situations we would like to check the individual components of the prior separately, as this gives us more information about a prior-data conflict when one occurs. For example, it may be that $\Pi_1$ conflicts but $\Pi_2$ does not.

We distinguish two different situations. First, the parameters $\theta_1$ and $\theta_2$ may both be part of the likelihood function and, second, only $\theta_2$ may be part of the likelihood function. The second situation corresponds to hierarchical models, and $\theta_1$ is a hyperparameter. Methods are presented in Evans and Moshonov (2006, 2007) for both of these situations, but we only discuss hierarchical models here.

With proper priors we have the prior $\Pi_2(d\theta_2) = \int_{\Theta_1} \Pi_2(d\theta_2 \mid \theta_1)\,\Pi_1(d\theta_1)$ for the model parameter, and the methods of Section 2, based on the minimal sufficient statistic $T$ for the model $\{P_{\theta_2} : \theta_2 \in \Theta_2\}$, are available to check whether or not this marginal prior conflicts with the data. While this check is available, Evans and Moshonov (2006) develop a factorization that is appropriate for checking the components, such as the second level $\Pi_2(\cdot \mid \theta_1)$, of a hierarchical model.

To simplify the presentation, we will suppose there are no relevant ancillaries for $\{P_{\theta_2} : \theta_2 \in \Theta_2\}$ based on $T$, but note that these can be incorporated as well. We can formally generate another model for $x$ from the joint distribution, namely, via

$$M_{\theta_1}(dx) = \int_{\Theta_2} P_{\theta_2}(dx)\,\Pi_2(d\theta_2 \mid \theta_1) = P(dx \mid T)(t) \int_{\Theta_2} P_{T\theta_2}(dt)\,\Pi_2(d\theta_2 \mid \theta_1) = P(dx \mid T)(t) \times M_{T\theta_1}(dt),$$

where $P_{T\theta_2}$ denotes the marginal distribution of $T$ under $P_{\theta_2}$. This model is only formal, as, indeed, our model indicates that $x$ was not generated via $M_{\theta_1}$ for some value of $\theta_1$. Here $M_{\theta_1}$ is the conditional prior predictive distribution for $x$ given $\theta_1$, and $M_{T\theta_1}$ is the conditional prior predictive distribution for $T$ given $\theta_1$. Note that when $\Pi_2(\cdot \mid \theta_1)$ is proper, as in the paper, then $M_{\theta_1}$ and $M_{T\theta_1}$ are also proper.
Let $V(T)$ be a minimal sufficient statistic for the formal model for $T$ given by $\{M_{T\theta_1} : \theta_1 \in \Theta_1\}$. We can factor $M_{T\theta_1}$ as $M_T(\cdot \mid V) \times M_{V\theta_1}$, where $M_T(\cdot \mid V)$ is the conditional prior predictive distribution of $T$ given $V$ (which, by the sufficiency of $V$, does not depend on $\theta_1$), and $M_{V\theta_1}$ is the conditional prior predictive distribution of $V$ given $\theta_1$. Then the joint distribution of $(\theta_1, x)$ can be factored as

$$P(\cdot \mid T) \times M_T(\cdot \mid V) \times M_V \times \Pi_1(\cdot \mid V), \qquad (2)$$

where $M_V$ is the prior predictive distribution of $V$ and $\Pi_1(\cdot \mid V)$ is the posterior distribution of $\theta_1$.

Consider how each of the factors in (2) is to be used. First, $P(\cdot \mid T)$ is available for checking the basic sampling model $\{P_{\theta_2} : \theta_2 \in \Theta_2\}$. If no evidence is found against $\{P_{\theta_2} : \theta_2 \in \Theta_2\}$, we can proceed to check the formal model $\{M_{T\theta_1} : \theta_1 \in \Theta_1\}$ for $T$ using $M_T(\cdot \mid V)$, and note that this does not depend on $\Pi_1$. Note also that $M_T(\cdot \mid V)$ is proper whenever $\Pi_2(\cdot \mid \theta_1)$ is proper for each value of $\theta_1$. If evidence is found against this model then, because we have accepted the sampling model, and so consequently the model $\{P_{T\theta_2} : \theta_2 \in \Theta_2\}$ for $T$, this must occur because of a conflict between the observed value $T(x)$ and $\Pi_2$. So a check of the formal model $\{M_{T\theta_1} : \theta_1 \in \Theta_1\}$ using $M_T(\cdot \mid V)$ is a check for prior-data conflict with $\Pi_2$. Note that this check proceeds exactly as in the simpler situation described in Section 2. If we find no evidence against $\{M_{T\theta_1} : \theta_1 \in \Theta_1\}$, then we can check for a conflict with $\Pi_1$ using $M_V$. Finally, if there is no conflict with $\Pi_1$, then $\Pi_1(\cdot \mid V)$ is available for inference about $\theta_1$. Of course, if there is no conflict with $\Pi_1$ and $\Pi_2$, then we can also make inference about the parameter of interest $\theta_2$.
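The order in which the factors of (2) are used can be summarized as a short Python sketch. The function name `staged_checks` and the fixed significance threshold `alpha` are our own illustrative assumptions; in practice each p-value would be weighed as evidence rather than compared mechanically with a cutoff. The point shown is only the sequencing: a later check is interpreted only when every earlier one has passed.

```python
def staged_checks(p_sampling, p_pi2, p_pi1, alpha=0.05):
    """Hypothetical helper ordering the checks implied by factorization (2).

    p_sampling: p-value computed from P(. | T)    (sampling model)
    p_pi2:      p-value computed from M_T(. | V)  (prior-data conflict with Pi_2)
    p_pi1:      p-value computed from M_V         (prior-data conflict with Pi_1)
    alpha:      illustrative threshold (an assumption, not part of the method)
    """
    if p_sampling < alpha:
        # Prior-data conflict is meaningless if the sampling model fails.
        return "sampling model fails; stop"
    if p_pi2 < alpha:
        return "conflict with Pi_2"
    if p_pi1 < alpha:
        return "conflict with Pi_1"
    # Only now are Pi_1(. | V) and the posterior for theta_2 used for inference.
    return "no conflict; proceed to inference about theta_1 and theta_2"
```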

The model $\{M_{V\theta_1} : \theta_1 \in \Theta_1\}$ may have ancillaries. Let $W \circ V$ be such a maximal ancillary. We then have that $M_V$ factors as $M_V = M_{W \circ V} \times M_V(\cdot \mid W \circ V)$, so that (2) becomes

$$P(\cdot \mid T) \times M_T(\cdot \mid V) \times M_{W \circ V} \times M_V(\cdot \mid W \circ V) \times \Pi_1(\cdot \mid V). \qquad (3)$$

In this case, given that we have accepted the sampling model, the factor $M_{W \circ V}$ is available for checking for prior-data conflict with $\Pi_2$, and $M_V(\cdot \mid W \circ V)$ is the appropriate factor for checking $\Pi_1$. The justification for this is exactly as in the simple case discussed in Section 2.

Note that in (3), the only distribution that will necessarily be improper when $\Pi_1$ is improper is $M_V(\cdot \mid W \circ V)$. The measure $M_V(\cdot \mid W \circ V)$ is to be used only in the check for $\Pi_1$. Therefore, the choice of an improper $\Pi_1$ is really an assertion that this prior will never conflict with the data. Irrespective of whether or not $\Pi_1$ is improper, the factors $M_T(\cdot \mid V)$ and $M_{W \circ V}$ are available to check for prior-data conflict with $\Pi_2$, when it is proper.

We consider the implementation of this approach 
in the normal-normal hierarchical model presented 
in the paper. 

Example (Normal-normal hierarchical model). We first consider a simpler model. In particular, we assume that the known $\sigma_i^2$ are all equal to $\sigma^2$ and that we have balance, namely, $n_1 = \cdots = n_I = n$. For this problem we have that $T(x) = (\bar{x}_1, \dots, \bar{x}_I)' \sim N_I(\theta, (\sigma^2/n) I_I)$, where $I_I$ denotes the $I \times I$ identity matrix, and here $\theta$ is the model parameter (corresponding to $\theta_2$ in our parameterization of a hierarchical model above). Therefore, according to our factorization, we check the sampling model using $P(\cdot \mid T)$, which is effectively the distribution of the residuals.
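A minimal sketch of such a residual-based check in Python, assuming known common variance and taking (our choice, not the paper's) the within-group sum of squares divided by $\sigma^2$ as the discrepancy; under the sampling model it has a chi-squared distribution with $I(n-1)$ degrees of freedom in the balanced case. Because the residuals $x_{ij} - \bar{x}_i$ have a distribution free of $\theta$, the code simulates the reference distribution with every group mean set to 0, which keeps it within the standard library. The function names are hypothetical.

```python
import random


def within_group_ss(groups):
    """Sum over groups of squared deviations from each group's own mean."""
    total = 0.0
    for g in groups:
        m = sum(g) / len(g)
        total += sum((v - m) ** 2 for v in g)
    return total


def residual_check_pvalue(groups, sigma, n_sim=5000, seed=0):
    """Monte Carlo p-value for excess within-group spread, based on P(. | T).

    Given the group means, the residuals do not involve theta, so the
    reference distribution is simulated with every theta_i = 0.
    """
    rng = random.Random(seed)
    obs = within_group_ss(groups) / sigma ** 2
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_sim):
        sim = [[rng.gauss(0.0, sigma) for _ in range(k)] for k in sizes]
        if within_group_ss(sim) / sigma ** 2 >= obs:
            hits += 1
    # The +1 terms count the observed data as one draw, keeping p in (0, 1].
    return (hits + 1) / (n_sim + 1)
```

This one-sided discrepancy only detects excess variability; other residual-based discrepancies could be substituted in the same way.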

Now

$$(\bar{x}_1, \dots, \bar{x}_I)' = (\theta_1, \dots, \theta_I)' + (\sigma/\sqrt{n})(z_1, \dots, z_I)',$$

where the $z_i$ are i.i.d. $N(0, 1)$ and, from the second level, $(\theta_1, \dots, \theta_I)' \sim N_I(\mu\mathbf{1}, \tau^2 I_I)$, independent of $(z_1, \dots, z_I)'$. Thus $(\mu, \tau^2)$ is the hyperparameter (corresponding to $\theta_1$ in our parameterization of a hierarchical model above; the $\theta_i$ here are the group means). This implies that $M_{T(\mu,\tau^2)}$ is given by $(\bar{x}_1, \dots, \bar{x}_I)' \sim N_I(\mu\mathbf{1}, (\tau^2 + \sigma^2/n) I_I)$. It is then easy to see that $V(\bar{x}_1, \dots, \bar{x}_I) = (\sum_{i=1}^I \bar{x}_i, \sum_{i=1}^I \bar{x}_i^2)$ is a minimal sufficient statistic for the model $\{M_{T(\mu,\tau^2)} : \mu \in \mathbb{R}, \tau^2 > 0\}$. Note also that $V$ is a complete minimal sufficient statistic, so there are no relevant ancillaries $W$ that we need consider for the check for the second level.
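To make the "easy to see" step explicit, here is the routine derivation under the balanced, equal-variance assumptions of this example, writing $\lambda = \tau^2 + \sigma^2/n$ and $t = (t_1, \dots, t_I)'$ for a value of $(\bar{x}_1, \dots, \bar{x}_I)'$:

```latex
\begin{aligned}
m_{T(\mu,\tau^2)}(t)
  &= (2\pi\lambda)^{-I/2}\exp\Big\{-\tfrac{1}{2\lambda}\textstyle\sum_{i=1}^{I}(t_i-\mu)^2\Big\}\\
  &= (2\pi\lambda)^{-I/2}\exp\Big\{-\tfrac{I\mu^2}{2\lambda}\Big\}
     \exp\Big\{\tfrac{\mu}{\lambda}\textstyle\sum_{i=1}^{I} t_i
     - \tfrac{1}{2\lambda}\textstyle\sum_{i=1}^{I} t_i^2\Big\}.
\end{aligned}
```

The density depends on $t$ only through $(\sum t_i, \sum t_i^2)$, so $V$ is sufficient by the factorization theorem. This is a two-parameter exponential family whose natural parameter $(\mu/\lambda, -1/(2\lambda))$ ranges over an open subset of $\mathbb{R}^2$ as $\mu \in \mathbb{R}$ and $\tau^2 > 0$ vary, which is what makes $V$ complete and minimal sufficient.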

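To determine $M_T(\cdot \mid V)$ we need the conditional distribution of $(\bar{x}_1, \dots, \bar{x}_I)'$ given $(\sum_{i=1}^I \bar{x}_i, \sum_{i=1}^I \bar{x}_i^2)$. This is clearly uniform on the intersection of the sphere $\{(y_1, \dots, y_I)' : \sum_{i=1}^I y_i^2 = \sum_{i=1}^I \bar{x}_i^2\}$ with the hyperplane $\{(y_1, \dots, y_I)' : \sum_{i=1}^I y_i = \sum_{i=1}^I \bar{x}_i\}$ of $\mathbb{R}^I$; this intersection is a sphere centered at $\bar{x}\mathbf{1}$, where $\bar{x} = I^{-1}\sum_{i=1}^I \bar{x}_i$, with squared radius $s^2 = \sum_{i=1}^I (\bar{x}_i - \bar{x})^2$. We can simulate from this distribution by generating $v_1, \dots, v_{I-1}$ i.i.d. $N(0, 1)$, putting $u_i = v_i / (\sum_{j=1}^{I-1} v_j^2)^{1/2}$ and

$$(y_1, \dots, y_I)' = \bar{x}\mathbf{1} + s A (u_1, \dots, u_{I-1})',$$

where $A \in \mathbb{R}^{I \times (I-1)}$ is such that the matrix $(\mathbf{1}/\sqrt{I}, A)$ is orthogonal. Then for any particular discrepancy statistic, we can compute an appropriate p-value via simulation.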
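The following Python sketch implements this sampler for the balanced, equal-variance case. The choice of the Helmert basis for $A$, the discrepancy (the largest absolute deviation of a group mean from the grand mean), and all function names are our own illustrative assumptions; any $A$ with $(\mathbf{1}/\sqrt{I}, A)$ orthogonal, and any sensible discrepancy, could be substituted. Each draw keeps the sum and the sum of squares of the group means fixed, so the Monte Carlo p-value is computed from $M_T(\cdot \mid V)$ and involves neither $\mu$, $\tau^2$, nor any prior on them.

```python
import math
import random


def helmert_basis(I):
    """Return I-1 orthonormal vectors in R^I, each orthogonal to the all-ones
    vector; used as the columns of A, so (1/sqrt(I), A) is orthogonal."""
    cols = []
    for k in range(1, I):
        c = 1.0 / math.sqrt(k * (k + 1))
        cols.append([c] * k + [-k * c] + [0.0] * (I - k - 1))
    return cols


def sample_given_V(tbar, cols, rng):
    """One draw from M_T(. | V): uniform over vectors with the same sum and
    sum of squares as the observed group means tbar."""
    I = len(tbar)
    grand = sum(tbar) / I
    s = math.sqrt(sum((t - grand) ** 2 for t in tbar))  # radius of the sphere
    v = [rng.gauss(0.0, 1.0) for _ in range(I - 1)]
    norm = math.sqrt(sum(vi * vi for vi in v))
    u = [vi / norm for vi in v]  # uniform on the unit sphere in R^(I-1)
    return [grand + s * sum(u[k] * cols[k][i] for k in range(I - 1))
            for i in range(I)]


def max_deviation(t):
    """Hypothetical discrepancy: largest |group mean - grand mean|."""
    grand = sum(t) / len(t)
    return max(abs(ti - grand) for ti in t)


def conditional_pvalue(tbar, n_sim=2000, seed=0):
    """Monte Carlo estimate of M_T(D(T) >= D(t_obs) | V) for D = max_deviation."""
    rng = random.Random(seed)
    cols = helmert_basis(len(tbar))
    d_obs = max_deviation(tbar)
    hits = sum(1 for _ in range(n_sim)
               if max_deviation(sample_given_V(tbar, cols, rng)) >= d_obs)
    return (hits + 1) / (n_sim + 1)
```

For instance, nine equal group means and one far-off mean give a very small p-value, flagging a conflict with the second-level assumption that the group means are exchangeable normal draws.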

The above analysis also applies when the $\sigma_i^2/n_i$ are all equal. When they are not equal the analysis is more complicated, as the form of $V$ depends on which ones are equal. Further, $V$ is then not a complete minimal sufficient statistic, and so there are relevant ancillaries.

Based on the factorization (3), we feel that $M_T(\cdot \mid V)$ and $M_{W \circ V}$ are appropriate distributions for computing p-values to assess the second level of a hierarchical model. Further, the uniformity of the corresponding p-values should be assessed against these distributions, and this does not require that $\Pi_1$ be improper.

It is difficult to compare our approach with the proposal in the paper, but we note that it has the distinct advantage of not involving the prior for the first level. For our check on the second level we need to say nothing about the prior for the first level, and it can be improper. The intuition for this lies with conditioning on $V$, which completely removes the effect of $\Pi_1$ on the prior predictive for $T$, and the fact that $\Pi_2$ induces the ancillary $W \circ V$. Therefore any conflict that is found can only be due to $\Pi_2$. It may be that the method proposed in the paper will satisfy (3) in an asymptotic sense, but we do not have a proof of this.

4. CONCLUSIONS 

It is sometimes suggested that model checking is a somewhat informal process. Partly this is because models can fail in many ways, and some of these may be more relevant in certain situations than others. It seems impossible, then, to come up with a methodology that will check for all of the possibilities simultaneously. So it seems reasonable to ask that we specify a set of checks that we think are relevant, prior to seeing the data, and then implement only these, rather than going on a hunting expedition for defects. A similar approach seems appropriate for checking for prior-data conflict.

While selection of the actual checks is perhaps 
somewhat informal, we do not believe that there is 
complete freedom in this. Some general principles 
must apply. The ill effects of double use of the data, 
as discussed in this paper and Bayarri and Berger 
(2000), provide a good example of the need for such 
principles. 

In frequentist statistical theory, inference about parameters depends on the data only through the minimal sufficient statistic, and what is left over in the data (the residual) is available for model checking. Mixing these up would seem to correspond to an inappropriate statistical analysis. We believe this is equally applicable in Bayesian formulations.

Checking for prior-data conflict seems to sit between model checking and inference. While it depends on the minimal sufficient statistic, however, the factorization given by (1) indicates that it really is separate from model checking and inference, as it involves a separate component of the full information as expressed by the joint distribution. In essence, (1) prescribes how each component of the full information is to be used in a statistical analysis. If we mix these up, it would seem to us that we can expect illogical or incoherent behavior, for example, overly conservative p-values. Note that in a certain sense each component of (1) is independent of the others, as we could prescribe each probability measure separately and still end up with a valid joint distribution. Specification of each component of (1) is necessary and sufficient for the specification of a joint probability distribution for $(x, \theta)$.

Of course, this restriction could be weakened to requiring that a methodology only satisfy (1) in some asymptotic sense. The motivation for this would seem to arise from the complexity of some situations. Still, (1) can be implemented exactly with many models of considerable importance, so it is not just of theoretical relevance.

Similarly, we believe that (3) is the relevant factorization for model checking and checking for prior-data conflict in hierarchical models. From that perspective, it would be important to see if the methods proposed in the paper satisfy this in some asymptotic sense. This would give us more confidence that these constitute an appropriate way to proceed in situations where they are felt to be necessary.

We also feel that our discussion of (3) shows that the choice of prior $\Pi_1$ for $\theta_1$ is irrelevant for checking $\Pi_2$ with hierarchical models. In particular, whether $\Pi_1$ is proper or improper, the check for $\Pi_2$ is the same, and this is a satisfying result. This does not appear to be the case for the method proposed in the paper, which depends, in particular, on which objective prior we use. Perhaps this effect disappears as the amount of data increases, but then the relevance of checking for prior-data conflict disappears too, as the effect of the prior on inference disappears, at least under reasonable regularity conditions.

Overall, our purpose here is to suggest that there is a principled approach to the question addressed in the paper. We are not saying that using the partial posterior approach is in some way incorrect. We do think, however, that it would be worth investigating to what extent the partial posterior approach satisfies (3).